Inference for epidemic models with time‐varying infection rates: Tracking the dynamics of oak processionary moth in the UK

Abstract Invasive pests pose a great threat to forest, woodland, and urban tree ecosystems. The oak processionary moth (OPM) is a destructive pest of oak trees, first reported in the UK in 2006. Despite great efforts to contain the outbreak within the original infested area of South‐East England, OPM continues to spread. Here, we analyze data consisting of the numbers of OPM nests removed each year from two parks in London between 2013 and 2020. Using a state‐of‐the‐art Bayesian inference scheme, we estimate the parameters for a stochastic compartmental SIR (susceptible, infested, and removed) model with a time‐varying infestation rate to describe the spread of OPM. We find that the infestation rate and subsequent basic reproduction number have remained constant since 2013 (with R0 between one and two). This shows further controls must be taken to reduce R0 below one and stop the advance of OPM into other areas of England. Synthesis. Our findings demonstrate the applicability of the SIR model to describing OPM spread and show that further controls are needed to reduce the infestation rate. The proposed statistical methodology is a powerful tool to explore the nature of a time‐varying infestation rate, applicable to other partially observed time series epidemic data.

In this paper, we use data tracking the numbers and locations of OPM nests removed from oak trees as part of a control program in two parks in London. We illustrate the use of statistical inference techniques for estimating the parameters for a classic SIR compartmental model (Bartlett, 1949;Kermack & McKendrick, 1927) consisting of susceptible, infested, and removed states. To allow for intrinsic stochasticity in the spread of OPM, we use an Itô stochastic differential equation (Oksendal, 1995) representation of the SIR model. This is further modified via the introduction of a time-varying infestation rate, as is necessary to capture the effect of unknown influences such as preventative measures (Dureau et al., 2013).
Bayesian inference for the resulting model is complicated by the intractability of the observed data likelihood, and subsequently, the joint posterior distribution of the key quantities of interest (model parameters and dynamic components). We overcome these difficulties via a linear Gaussian approximation of the stochastic SIR model, coupled with a Markov chain Monte Carlo scheme (Fearnhead et al., 2014) for generating samples from the joint posterior. These methods are outlined in Section 2 and detailed in Appendix S1, Sections S1 and S2, for use as a toolbox to apply to other ecological datasets.
We use the parameters from the compartmental model to estimate a yearly R 0 measure for OPM, analogous to the basic reproduction number for a pathogen (Heesterbeek & Dietz, 1996), and estimate the OPM population in 2021.

| ME THODS
In this section, we present the observational time series data with a summary of the data collection methods (Section 2.1), the details of the stochastic SIR model (Section 2.2), and an outline of our statistical inference methods (Section 2.3). Further statistical details including the relevant algorithms are set out in Appendix S1 (Sections S1 and S2).

| Data
The data in this paper are from Richmond and Bushy Park, collected and processed by The Royal Parks. This is shared with the Forestry Commission on an annual basis to inform the national Oak Processionary Moth Programme (Contingency Plan, 2021). The University of Southampton (GeoData) provide analysis, support, and hold the program data on behalf of the Forestry Commission.
The data used in this study were obtained through the recording of OPM nest removals in Richmond and Bushy Parks in The raw and cumulative time series of the numbers of removed nests are shown in Figure 2(a,b). We count each tree in the year it first had nests removed as one "removal" in the SIR model (see Section 2.2), regardless of how many nests were recorded as removed from this location. The raw and cumulative time series for the number of unique trees which had nests removed are shown in Figure 2(c,d). We use the latter cumulative time series, R(t), as our observational data in the following sections.

| Stochastic SIR model
We consider a SIR model (Andersson & Britton, 2000;Kermack & McKendrick, 1927) in which a population of trees of fixed size N is classified into compartments consisting of susceptible (S), infested (I ), and removed (R) trees. Although most commonly used in epidemiology, SIR models have previously been used to describe the spread of tree diseases (Parry et al., 2014;Rodriguez-Quinones & Gordillo, 2019) and invasive species through varying landscapes (Ferrari et al., 2014;Wildemeersch et al., 2019). In our case, susceptible trees are those that have yet to ever be infested with OPM nests and are at risk from the currently infested trees. The time series data we use (see Section 2.1) are observations of the removed category, trees that have previously been infested with OPM nests and have now had these nests removed. A fixed population of trees is appropriate as over the timescale of interest the number of trees born into the S compartment will be sufficiently small to be negligible. Transitions between compartments can be summarized via two dynamics, as detailed in Appendix S1, Section S1 (Ho et al., 2018).
We eschew the MJP formalism in favor of a continuous-valued approximation, formulated as a stochastic differential equation (SDE). This is a pragmatic choice, since the SDE model ultimately leads to a computationally efficient inference scheme, and the model can be easily augmented with additional components, such as time-varying parameters, which we now describe.
The SDE representation of the standard SIR model can be derived directly from the MJP (see Appendix S1, Section S1). Here, we extend this to include a time-varying infestation process. Let X t = (S t , I t ,̃ t ) � where S t and I t denote the numbers in each of the states Sand Iat time t ≥ 0and ̃ t = log t is the (transformed) timevarying infestation process. Note that the fixed population size gives R t = N − S t − I t for all t ≥ 0so that the current state of the SIR model is completely described by X t . We model ̃ t as a generalized Brownian motion process so that and W 3,t is a standard one-dimensional Brownian motion process.
Hence, we assume that the log infestation rate evolves according to a random walk in continuous time, with variability controlled by .

Combining this process with component SDEs describing the dynam-
ics of S t and I t gives the complete SDE description of the SIR model with

time-varying infestation rate as
Here, x t = (s t , i t ,̃ t ) � is the state of the system at time t, θ = (γ, σ)′ is the vector of static parameter values, W t = (W 1,t , W 2,t , W 3,t )′ is a vector of uncorrelated standard Brownian motion processes, and the drift function a(x t , )and diffusion coefficient b(x t , )are given by Unfortunately, due to the nonlinear forms of a(x t , ) and b(x t , ) , the SDE specified by (1) and (2) cannot be solved analytically. We, therefore, replace the intractable analytic solution with a tractable Gaussian process approximation, which is the subject of the next section. The resulting linear noise approximation is subsequently used as the inferential model.

| Linear noise approximation
The linear noise approximation (LNA) provides a tractable approximation to the SDE given by (1) and (2). In what follows we give a brief derivation; formal details can be found in Kurtz, (1972) (see also Kampen, 2001;Komorowski et al., 2009).
(3) X t = t + Z t , F I G U R E 2 (a) Raw and (b) cumulative number of OPM nests removed from Richmond (blue) and Bushy (orange dashed) parks between 2013 and 2021. The number of (c) raw and (d) cumulative unique trees (described by their eastings and northings) which had nests removed between 2013 and 2021. The cumulative number of trees is the time series R(t) corresponding to the "removed" category in the SIR model (see Section 2.2) and {Z t , t ≥ 0} is a residual stochastic process. The residual process Z t satisfies which will typically be intractable. The assumption that ||X t − t || is "small" motivates a Taylor series expansion of a(x t , ) and b(x t , ) about t , with retention of the first two terms in the expansion of a and the first term in the expansion of b. This gives an approximate residual pro- where H t is the Jacobian matrix with (i, j)th element For the SIR model in (1) and (2), we therefore have Given an initial condition Ẑ 0 ∼ N(ẑ 0 ,V 0 ), it can be shown that Ẑ t is a Gaussian random variable (see Fearnhead et al., 2014). Consequently, the partition in (3) with Z t replaced by Ẑ t , and the initial conditions 0 = x 0 and Ẑ 0 = 0 give where t satisfies (4) and V t satisfies Further details on the derivation of (6) are given in Appendix S1, Section S1. Hence, the linear noise approximation is characterized by the Gaussian distribution in (5), with mean and variance found by solving the ODE (ordinary differential equation) system given by (4) and (6). Although this ODE system will typically be intractable, a numerical scheme can be straightforwardly applied.

| Bayesian inference
We consider the case in which not all components of the stochastic epidemic model are observed. Moreover, we assume that data points are subject to measurement error, which accounts for mismatch between the latent and observed process, due to, for example, the way in which the data are collected. Observations (on a regular grid) y t , t = 0, 1, ⋯n are assumed conditionally independent (given the latent process) with conditional probability distribution obtained via the observation equation, where P = (1, 1, 0)′. This choice of P is due to the data consisting of observations on the removed state R t , which, for a known population size N, is equivalent (in information content) to observing the sum S t + I t . Note that the logarithm of the infestation rate process, ̃ t , is Given data y = (y 0 , y 1 , ⋯, y n ) and upon ascribing a prior density Given inferences on the static parameters and the latent dynamic process x, we consider the following diagnostics for assessing model fit. The within-sample predictive density is and the one step ahead out of sample predictive density is Hence, in both cases we properly account for parameter and latent process uncertainty. Although the densities in (9) and (10) will be intractable, we may generate samples via Monte Carlo, see Appendix S1 Section S2 for further details.
We assume the epidemic time series (see Section 2.1) for the cumulative number of trees with removed nests, R(t), shown in Figure 2(c), can be described by the compartmental SIR model with a timevarying infestation rate (see Section 2.2). The aim is to estimate the key parameters through the Bayesian inference techniques described in Section 2.3. These are the time-varying infestation rate per tree, (t), with corresponding stochastic noise parameter describing d̃ t = dlog( t ) = W 3,t , the removal rate , and the observation error e .
Section S2.4 of Appendix S1 provides details of the assumed population sizes for each park, initial numbers for the S (susceptible) and I (currently infected) tree categories and the initial infestation rate, along with the starting parameter values for the MCMC scheme and prior specification. Regarding the latter, we take an independent prior specification for the components of , so that ( ) = ( ) ( ) ( e ). We then take lognormal LN(1, 1) distributions for and e , and a lognormal LN (0, 0. 5 2 ) distribution for . We assume that initial log infestation rate ̃ 0 follows a Gaussian N ( − 8.5, 0.

| Inference results
We ran the MCMC scheme for 10 × 10 3 iterations and monitored the resulting chains for convergence. Indicative trace plots can be found in Appendix S1, Figure S1 and suggest that the sampler has adequately explored the parameter space. Additional chains initialized at different starting values (not shown) further confirm convergence.

| Estimation of R 0
From the posterior estimations of ̃ t for each year and the parameter , we can estimate the basic reproduction number R 0 . This gives a measure of the strength of infectivity through the number of trees which become infested as a result of a single infested tree over its infested life time (i.e., the expected number of secondary infestations resulting from a single original infestation). This provides additional information over the parameter , since R 0 takes into account the lifetime in which a tree can infest another tree (i.e., the removal rate ). In a deterministic system, for an epidemic to die out, R 0 must be less than the threshold value of one. However, in the stochastic case, it is possible for R 0 to be above one but the epidemic still die out as a result of the stochastic fluctuations. Therefore, it is required that (corresponding to the relatively constant t ). However, this suggests that R 0 is still above one, and therefore, the epidemic will continue to propagate in these areas and potentially beyond.

| Forward prediction
Predictions of the spread of OPM are needed to inform control strategies. To test the validity of the SIR model with the inferred parameters from Section 3.1 and thus how well the model can capture future expansions in OPM, we can calculate a one-year prediction for a known data point. We remove the last data point, R(2020), and re-infer the parameters for the new shortened observed time series.
We then use these parameters to run the model forward (

| DISCUSS ION
Recent modeling work has suggested that the surroundings of the current OPM infestation area in the UK are highly climatically suitable and, therefore, at very high risk from future infestations (Godefroid et al., 2020). Since the government strategy for the containment of OPM relies on targeted control at the boundary of the current infested area, it is crucial to understand and be able to predict the future spread to optimize both the cost and efficacy of these control programs (Contingency Plan, 2021).
We have shown the applicability of a SIR compartmental model with a time-varying infestation rate to describe the OPM epidemic in the UK between the years 2013 and 2020. Such models have previously been used to describe the spread of tree diseases (Rodriguez-Quinones & Gordillo, 2019) and invasive species (Ferrari et al., 2014;Wildemeersch et al., 2019). The statistical methodology used is a powerful tool for inferring the parameters of such models from real data and is transferable to other epidemiological and ecological datasets. Previously, similar statistical methodology has been used to describe the spread of infectious diseases (e.g., measles (Cauchemez & Ferguson, 2008) and Ebola (Fintzi et al., 2020)) and the spatial expansion of non-native plants (Cook et al., 2007), but has not yet been applied to the study of invasive insects.
Our results show, along with previous analysis (Suprunenko et al., 2021), that the spread of OPM is continuing at a stable rate despite the current intervention methods. Correspondingly, we show that the basic reproduction number R 0 has been above one since 2013. To see a reduction in the OPM population density and to protect the surrounding areas, a reduction of R 0 to below one would need to be seen.
Although the basic reproduction number R 0 is typically used in the modeling of infectious diseases (Dietz, 1993;Ma, 2020), here it gives an analogous measure for the expected number of infested trees caused by a single currently infested tree through its infested lifetime.
Driven by the nature of the data collected, we chose to make the assumption that the trees with removed nests best represented the removed category in the SIR model. Although not explicitly formulated in this model, we expect that after nest removal and spraying with a biological insecticide, these trees will not be susceptible to future infestation on the short time scales considered here. This limitation of the model could be explored further through assuming this data instead represented the infested category (with the caveat that these trees would not actually be infective to others at the times the data were collected) or by extending the model to an SIRS formulation, which would allow for re-infestation after a period of immunity.
For simplicity and to be better described by a SIR model, we counted each tree that had undergone nest removal as one removed tree, regardless of how many nests were recorded as being removed from it. However, the defoliation effects and risks to human health from OPM are closely related to nest density (i.e., the numbers of nests per tree) (Jactel et al., 2011). In future work, nest density could be taken into account through a nest density-dependent infestation rate.

A challenge of modeling OPM and other tree pests and diseases
is the lack of a complete inventory oak trees in the UK, representing the susceptible population in our SIR model. This has been previously noted and highlighted as a priority for future data collection by other modeling studies (Cowley et al., 2015). It is of particular importance for future spatial models of OPM, which require an estimate of the distribution of oak trees in the areas of interest. park. The shaded area shows the 50% credible region. The orange line shows the observed data up to 2020 used to parameterize the model, and the red cross shows the most recent data for 2021